Recent work has shown that large pretrained language models (LMs) not only perform remarkably well on a range of natural language processing (NLP) tasks, but also begin to improve on reasoning tasks such as arithmetic induction, symbolic manipulation, and commonsense reasoning as model scale increases. However, it remains unclear what the underlying capabilities of these LMs are. Surprisingly, we find that these models have limitations on certain basic symbolic manipulation tasks such as copy, reverse, and addition. Model performance drops rapidly as the total number of symbols or repeating symbols increases. We investigate the potential causes behind this phenomenon and examine a set of possible mitigations, including explicit positional markers, fine-grained computation steps, and LMs with callable programs. Experimental results show that none of these techniques fully solves even the simplest addition induction problem. Finally, we introduce LMs with a tutor, which demonstrates every single step of teaching. LMs with a tutor are able to deliver 100% accuracy in out-of-distribution (OOD) and repeated-symbol settings, shedding new light on the boundaries of large LMs in induction.
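As an illustration of the explicit positional-marker mitigation, the hedged sketch below tags each digit of an addition prompt with its place index so that digit alignment no longer has to be inferred from surface order; the exact marker format probed in the paper may differ.

```python
# A minimal sketch (not the paper's exact prompt format) of explicit
# position markers: each digit is tagged with its place index so the
# model does not have to infer column alignment from raw digit order.

def tag_digits(number: str) -> str:
    """Annotate each digit with its position counted from the least
    significant digit, e.g. '357' -> 'p2:3 p1:5 p0:7'."""
    n = len(number)
    return " ".join(f"p{n - 1 - i}:{d}" for i, d in enumerate(number))

def addition_prompt(a: str, b: str) -> str:
    return f"{tag_digits(a)} + {tag_digits(b)} ="

print(addition_prompt("357", "48"))
# p2:3 p1:5 p0:7 + p1:4 p0:8 =
```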
The Transformer has been considered one of the most important deep learning models since 2018, in part because it set state-of-the-art (SOTA) records and has the potential to replace existing deep neural networks (DNNs). Despite these remarkable triumphs, the long turnaround time of Transformer models is a widely recognized roadblock. The variety of sequence lengths imposes additional computational overhead, since inputs must be zero-padded to the maximum sentence length in a batch to fit parallel computing platforms. This paper targets field-programmable gate arrays (FPGAs) and proposes a coherent sequence-length-adaptive algorithm-hardware co-design for Transformer acceleration. In particular, we develop a hardware-friendly sparse attention operator and a length-aware hardware resource scheduling algorithm. The proposed sparse attention operator reduces the complexity of attention-based models to linear and alleviates off-chip memory traffic. The proposed length-aware hardware resource scheduling algorithm dynamically allocates hardware resources to fill pipeline slots and eliminate bubbles for NLP tasks. Experiments show that our design incurs very small accuracy loss and achieves 80.2$\times$ and 2.6$\times$ speedup over CPU and GPU implementations, respectively, and is 4$\times$ faster than a state-of-the-art GPU accelerator optimized via cuBLAS GEMM.
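As a purely software-level illustration of how a sparse attention operator reaches linear complexity (the paper's actual operator and its FPGA scheduling are more involved), the sketch below restricts each query to a fixed local window of keys, giving O(L*w) cost instead of O(L^2).

```python
import torch

# A software sketch of a hardware-friendly banded sparse attention, assuming
# a fixed local window w. Each query attends to at most 2w+1 keys, so the
# cost is O(L*w) rather than O(L^2). This is an illustrative stand-in, not
# the paper's operator.

def banded_attention(q, k, v, w=8):
    L, d = q.shape
    out = torch.empty_like(v)
    for i in range(L):
        lo, hi = max(0, i - w), min(L, i + w + 1)
        scores = (q[i] @ k[lo:hi].T) / d ** 0.5   # scores over local window only
        out[i] = torch.softmax(scores, dim=-1) @ v[lo:hi]
    return out

q = k = v = torch.randn(128, 64)
y = banded_attention(q, k, v)
```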
Recent advances in statistical physics have shown the remarkable performance of machine learning in identifying phase transitions. In this paper, we apply domain adversarial neural networks (DANN) based on transfer learning to study a non-equilibrium and an equilibrium phase transition model, namely the directed percolation (DP) model and the percolation model, respectively. With DANN, only a small fraction of input configurations (2D images) needs to be labeled, and these are chosen automatically, in order to capture the critical point. To learn the DP model, the method is refined by an iterative procedure for determining the critical point, which is a prerequisite for the data collapse used to compute the critical exponent $\nu_{\perp}$. We then apply DANN to two-dimensional site percolation, with configurations filtered to include only the largest cluster, which is likely to contain the information related to the order parameter. The DANN learning of both models yields reliable results comparable to those from Monte Carlo simulations. Our study also shows that, compared with supervised learning, DANN can achieve quite high accuracy at a much lower cost.
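Since the abstract builds on the standard DANN architecture, the minimal PyTorch sketch below shows its core ingredient, a gradient reversal layer: the domain classifier's gradient is negated before reaching the feature extractor, pushing features to be domain-invariant. Layer sizes and the two heads are illustrative, not the paper's exact network.

```python
import torch
import torch.nn as nn

# Minimal DANN sketch with a gradient reversal layer (GRL). Sizes are
# placeholders, not the paper's architecture.

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -ctx.lam * grad, None   # negate gradient toward the features

features = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 128), nn.ReLU())
label_head = nn.Linear(128, 2)    # e.g. below/above the critical point
domain_head = nn.Linear(128, 2)   # source vs. target configurations

x = torch.randn(16, 1, 32, 32)    # batch of 2D configurations
f = features(x)
label_logits = label_head(f)                             # trained on labeled source data
domain_logits = domain_head(GradReverse.apply(f, 1.0))   # adversarial branch
```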
Time series forecasting is an important problem across many domains, including prediction of solar plant energy output, electricity consumption, and traffic congestion. In this paper, we propose to tackle such forecasting problems with the Transformer [1]. Although we were impressed by its performance in our preliminary study, we found two major weaknesses: (1) locality-agnostics: the point-wise dot-product self-attention in the canonical Transformer architecture is insensitive to local context, which can make the model prone to anomalies in time series; (2) memory bottleneck: the space complexity of the canonical Transformer grows quadratically with the sequence length L, making it infeasible to directly model long time series. To solve these two issues, we first propose convolutional self-attention, which produces queries and keys with causal convolution so that local context can be better incorporated into the attention mechanism. We then propose the LogSparse Transformer with only O(L(log L)^2) memory cost, improving forecasting accuracy for time series with fine granularity and strong long-term dependencies under a constrained memory budget. Our experiments on both synthetic data and real-world datasets show that it compares favorably to the state of the art.
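As an illustration of the sparsity pattern (our reading of the abstract, not the paper's reference code), the sketch below builds a LogSparse-style attention mask in which position t attends to itself and to positions at exponentially growing distances back, so each query row keeps only O(log L) keys.

```python
import numpy as np

# Sketch of a LogSparse-style causal attention mask: position t attends to
# itself and to t-1, t-2, t-4, ..., i.e. O(log L) keys per query instead of
# O(L). The paper's exact pattern may differ in details.

def logsparse_mask(L: int) -> np.ndarray:
    mask = np.zeros((L, L), dtype=bool)
    for t in range(L):
        mask[t, t] = True
        step = 1
        while t - step >= 0:
            mask[t, t - step] = True
            step *= 2
    return mask

m = logsparse_mask(16)
print(m.sum(axis=1))  # roughly log2(t) allowed keys per query position
```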
This supplementary paper introduces the Multidimensional Service Quality Scoring System (MSQs), a review-based method for quantifying host service quality that is employed in the paper Exit and transition: Exploring the survival status of Airbnb listings in a time of professionalization. MSQs is not an end-to-end implementation; it is composed of three pipelines, namely Data Collection and Preprocessing, Object Recognition and Grouping, and Aspect-based Service Scoring. Using the above study as a case, this article explains the technical details of MSQs.
Tensegrity robots, composed of rigid rods and flexible cables, exhibit high strength-to-weight ratios and extreme deformation capabilities, enabling them to navigate unstructured terrain and even survive harsh impacts. However, they are difficult to control due to their high dimensionality, complex dynamics, and coupled architecture. Physics-based simulation is a path toward developing locomotion policies that can then be transferred to real robots, but modeling tensegrity robots is a complex task, so simulations suffer a substantial sim2real gap. To address this, this paper describes a Real2Sim2Real strategy for tensegrity robots. The strategy is based on a differentiable physics engine that can be trained with limited data from the real robot (i.e., offline measurements and one random trajectory) and achieves sufficiently high accuracy to discover transferable locomotion policies. Beyond the overall pipeline, the primary contributions of this work include the computation of non-zero gradients at contact points, a loss function, and a trajectory segmentation technique that avoids conflicts in gradient evaluation during training. The proposed pipeline is demonstrated and evaluated on a real 3-bar tensegrity robot.
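A hedged sketch of the trajectory segmentation idea as described: rather than backpropagating through one long rollout, the measured trajectory is split into short segments, the differentiable simulator is re-initialized from measured states at segment boundaries, and per-segment losses are summed. `sim_step` is a placeholder for the engine's differentiable step; the loss form is illustrative.

```python
import torch

# Illustrative segmented system-identification loss for a differentiable
# physics engine. `measured` is a tensor of real robot states over time;
# `params` are the physical parameters being fit.

def segmented_loss(sim_step, params, measured, seg_len=20):
    loss = torch.tensor(0.0)
    for s in range(0, len(measured) - 1, seg_len):
        state = measured[s].clone()   # restart each segment from real data
        for t in range(s + 1, min(s + seg_len + 1, len(measured))):
            state = sim_step(state, params)
            loss = loss + ((state - measured[t]) ** 2).mean()
    return loss
```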
This work considers the task of learning representations on attributed relational graphs (ARGs). Both the nodes and edges of an ARG are associated with attributes/features, allowing ARGs to encode the rich structural information widely observed in real applications. Existing graph neural networks offer only limited ability to capture complex interactions within local structural contexts, which hinders them from exploiting the expressive power of ARGs. We propose the Motif Convolution Module (MCM), a new motif-based graph representation learning technique that makes better use of local structural information. The ability to handle continuous edge and node features is one advantage of MCM over existing motif-based models. MCM builds a motif vocabulary in an unsupervised way and deploys a novel motif convolution operation to extract the local structural context of individual nodes, which is then used to learn higher-level node representations via multilayer perceptrons and message passing in graph neural networks. Compared with other graph learning approaches on classifying synthetic graphs, our approach is substantially better at capturing structural context. We also demonstrate the performance and explainability advantages of our approach by applying it to several molecular benchmarks.
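As a loose illustration only: the sketch below scores each node's neighborhood against a motif vocabulary and uses the score vector as a structural-context feature. It summarizes neighborhoods by mean feature vectors, which is far simpler than the paper's motif convolution operation.

```python
import numpy as np

# Simplified motif-convolution-like step: motifs and neighborhoods are
# reduced to averaged feature vectors; the real MCM compares attributed
# subgraphs against the motif vocabulary directly.

def motif_convolution(node_feats, neighborhoods, motif_vocab):
    """For each node, score its local neighborhood against every motif;
    the score vector becomes the node's structural-context feature."""
    contexts = []
    for nbrs in neighborhoods:                   # list of node-index arrays
        local = node_feats[nbrs].mean(axis=0)    # crude neighborhood summary
        scores = np.array([local @ m for m in motif_vocab])
        contexts.append(scores)
    return np.stack(contexts)                    # (num_nodes, vocab_size)
```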
What is the best way to learn a universal face representation? Recent work on deep learning in the area of face analysis has focused on supervised learning of specific tasks of interest (e.g., face recognition, facial landmark localization, etc.), but has overlooked the overarching question of how to find a facial representation that can be readily adapted to several face analysis tasks and datasets. To this end, we make the following four contributions: (a) we introduce, for the first time, a comprehensive evaluation benchmark for facial representation learning consisting of five important face analysis tasks. (b) We systematically investigate two ways of large-scale representation learning applied to faces: supervised and unsupervised pre-training. Importantly, we focus our evaluations on the case of few-shot facial learning. (c) We investigate important properties of the training datasets, including their size and quality (labelled, unlabelled, or even uncurated). (d) To draw our conclusions, we conduct a very large number of experiments. Our two main findings are: (1) unsupervised pre-training on completely in-the-wild, uncurated data provides consistent and, in some cases, significant accuracy improvements across all face tasks; (2) many existing facial video datasets appear to have a large amount of redundancy. We will release code and pre-trained models to facilitate future research.
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is quite simple while its performance is impressive. CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT remains strongly robust even if the LiDAR input is missing. Code will be released at https://github.com/junjie18/CMT.
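The sketch below illustrates one way to read the implicit alignment step: both image and point cloud tokens receive positional encodings computed from 3D coordinates by a shared MLP before entering a common decoder. All shapes, sizes, and coordinate sources here are placeholders, not CMT's actual configuration.

```python
import torch
import torch.nn as nn

# Schematic sketch of coordinate-based positional encoding shared across
# modalities; the downstream decoder and the way coordinates are obtained
# are assumptions for illustration only.

coord_pe = nn.Sequential(nn.Linear(3, 256), nn.ReLU(), nn.Linear(256, 256))

img_tokens = torch.randn(1, 1000, 256)   # camera features
pts_tokens = torch.randn(1, 2000, 256)   # LiDAR features
img_coords = torch.rand(1, 1000, 3)      # 3D points associated with image tokens
pts_coords = torch.rand(1, 2000, 3)      # voxel/pillar center coordinates

tokens = torch.cat([img_tokens + coord_pe(img_coords),
                    pts_tokens + coord_pe(pts_coords)], dim=1)
# `tokens` would then feed a DETR-style decoder that predicts 3D boxes directly.
```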
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
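As a hedged illustration of the NAIVEATTACK variant (DOORPING additionally updates the trigger iteratively during distillation), the sketch below stamps a trigger patch onto a small fraction of the raw training images before distillation; patch size, position, and poisoning rate are illustrative choices, not the paper's settings.

```python
import numpy as np

# Illustrative trigger injection at the initial distillation phase: a small
# fraction of raw images gets a corner patch and the target label, so the
# backdoor is baked into the synthetic set produced by distillation.

def poison(images, labels, target=0, rate=0.01, patch=3):
    idx = np.random.choice(len(images), int(rate * len(images)), replace=False)
    images = images.copy()
    labels = labels.copy()
    images[idx, -patch:, -patch:] = 1.0   # white square in the corner
    labels[idx] = target                  # point poisoned samples at the target class
    return images, labels
```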